-
This paper studies the problem of data-driven combined longitudinal and lateral control of autonomous vehicles (AVs), so that the AV maintains a safe minimum distance from its leading vehicle while staying in its lane. Most existing methods for combined longitudinal and lateral control are either model-based or purely data-driven, such as reinforcement learning. Traditional model-based control approaches are insufficient for adaptive optimal control design of AVs operating in dynamically changing environments and subject to model uncertainty. Conventional reinforcement learning approaches, on the other hand, require a large volume of data and cannot guarantee the stability of the vehicle. These limitations are addressed by integrating advanced control theory with reinforcement learning techniques. Specifically, by utilizing adaptive dynamic programming (ADP) techniques and the motion data collected from the vehicles, a policy iteration algorithm is proposed that iteratively optimizes the control policy without precise knowledge of the AV's dynamical model. Furthermore, the stability of the AV is guaranteed by the control policy generated at each iteration of the algorithm. The efficacy of the proposed approach is validated in SUMO, a microscopic traffic simulation platform, for different traffic scenarios.
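As a rough illustration of the policy-iteration idea only, the following sketch runs the classical model-based Kleinman iteration for a generic continuous-time LQR problem; the data-driven algorithm summarized above replaces the Lyapunov solve with least squares on collected input-state trajectories, and the double-integrator matrices here are hypothetical placeholders, not the AV model.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def policy_iteration_lqr(A, B, Q, R, K0, n_iter=20):
    """Kleinman-style policy iteration for continuous-time LQR.

    Each iteration evaluates the current stabilizing gain K by solving a
    Lyapunov equation and then improves the policy; a data-driven variant
    would estimate the same quantities from measured trajectories instead
    of using (A, B) directly.
    """
    K = K0
    for _ in range(n_iter):
        Ak = A - B @ K
        # Policy evaluation: Ak^T P + P Ak = -(Q + K^T R K)
        P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
        # Policy improvement: K <- R^{-1} B^T P
        K = np.linalg.solve(R, B.T @ P)
    return K, P

# Hypothetical toy example (double integrator), not the AV model from the paper.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
K0 = np.array([[1.0, 1.0]])          # any stabilizing initial gain
K_opt, P_opt = policy_iteration_lqr(A, B, Q, R, K0)
```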
-
This paper proposes a novel learning-based adaptive optimal controller design method for a class of continuous-time linear time-delay systems. A key strategy is to exploit state-of-the-art reinforcement learning (RL) and adaptive dynamic programming (ADP) techniques and propose a data-driven method to learn the near-optimal controller without precise knowledge of the system dynamics. Specifically, a value iteration (VI) algorithm is proposed to solve the infinite-dimensional Riccati equation for the linear quadratic optimal control problem of time-delay systems using finite samples of input-state trajectory data. It is rigorously proved that the proposed VI algorithm converges to the near-optimal solution. Compared with the previous literature, the proposed VI algorithm has two appealing features: it is developed directly for continuous-time systems without discretization, and it does not require an initial admissible controller. The efficacy of the proposed methodology is demonstrated by two practical examples: metal cutting and autonomous driving.
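The infinite-dimensional delay Riccati equation is beyond a short snippet, but the basic value-iteration recursion can be sketched on an assumed delay-free, finite-dimensional LQR analogue (the Euler step size and iteration count below are arbitrary choices); this is an illustration of the VI idea, not the paper's algorithm.

```python
import numpy as np

def value_iteration_care(A, B, Q, R, eps=0.01, n_iter=20000):
    """Finite-dimensional sketch of continuous-time value iteration:
    repeatedly step the Riccati recursion
        P <- P + eps * (A^T P + P A + Q - P B R^{-1} B^T P)
    starting from P = 0, so no initial admissible controller is needed.
    """
    P = np.zeros_like(A, dtype=float)
    Rinv = np.linalg.inv(R)
    for _ in range(n_iter):
        P = P + eps * (A.T @ P + P @ A + Q - P @ B @ Rinv @ B.T @ P)
    K = Rinv @ B.T @ P          # near-optimal feedback gain u = -K x
    return P, K

# Toy delay-free example; the paper treats delay differential equations.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
P, K = value_iteration_care(A, B, np.eye(2), np.eye(1))
```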
-
The majority of past research on lane-changing controller design for autonomous vehicles (AVs) assumes full knowledge of the model dynamics of the AV and the surrounding vehicles. In the real world, this is not a realistic assumption, as accurate dynamic models are difficult to obtain and the model parameters may change over time due to various factors. Thus, there is a need for a learning-based lane-change controller design methodology that can learn the optimal control policy in real time from sensor data. This paper addresses this need by introducing an optimal learning-based control methodology that solves the real-time lane-changing problem of AVs, where the input-state data of the AV are used to generate a near-optimal lane-changing controller via the approximate/adaptive dynamic programming (ADP) technique. In this type of complex lane-changing maneuver, the lateral dynamics depend on the longitudinal velocity of the vehicle. If the longitudinal velocity is assumed constant, a linear parameter-invariant model can be used; however, assuming constant velocity during a lane-changing maneuver is unrealistic and can increase the risk of accidents, especially in the case of lane abortion when the surrounding vehicles are not cooperative. Therefore, in this paper, the dynamics of the AV are modeled as a linear parameter-varying system, which leaves two challenges for the lane-changing controller design: parameter variation and unknown dynamics. By combining gain scheduling and ADP techniques, a learning-based control algorithm is proposed that generates a near-optimal lane-changing controller without requiring an accurate dynamic model of the AV. The inclusion of gain scheduling with ADP makes the controller applicable to nonlinear and/or parameter-varying AV dynamics. The stability of the learning-based gain-scheduling controller is also rigorously proved. Moreover, a data-driven lane-changing decision-making algorithm is introduced that makes the AV perform a lane abortion if safety conditions are violated during a lane change. Finally, the proposed learning-based gain-scheduling controller design algorithm and the lane-changing decision-making methodology are numerically validated using MATLAB, SUMO simulations, and the NGSIM dataset.
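Below is a minimal model-based sketch of the gain-scheduling side of such a design, assuming a user-supplied function lateral_model(v) that returns a frozen (A, B) pair at longitudinal velocity v (a hypothetical placeholder, not the paper's AV model); the ADP part of the paper would learn comparable gains from input-state data instead of solving Riccati equations.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def schedule_gains(lateral_model, v_grid, Q, R):
    """Pre-compute an LQR gain at each grid velocity of the LPV model."""
    gains = []
    for v in v_grid:
        A, B = lateral_model(v)                      # frozen model at velocity v
        P = solve_continuous_are(A, B, Q, R)
        gains.append(np.linalg.solve(R, B.T @ P))
    return np.array(gains)

def scheduled_gain(gains, v_grid, v):
    """Linearly interpolate the stored gains at the current velocity."""
    i = int(np.clip(np.searchsorted(v_grid, v) - 1, 0, len(v_grid) - 2))
    t = (v - v_grid[i]) / (v_grid[i + 1] - v_grid[i])
    return (1.0 - t) * gains[i] + t * gains[i + 1]

# Hypothetical LPV toy model A(v) = A0 + v * A1, for illustration only.
A0 = np.array([[0.0, 1.0], [-2.0, -1.0]])
A1 = np.array([[0.0, 0.0], [0.0, -0.1]])
B = np.array([[0.0], [1.0]])
v_grid = np.linspace(10.0, 30.0, 5)
gains = schedule_gains(lambda v: (A0 + v * A1, B), v_grid, Q=np.eye(2), R=np.eye(1))
K_now = scheduled_gain(gains, v_grid, v=17.5)
```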
-
This paper studies the effect of perturbations on the gradient flow of a general nonlinear programming problem, where the perturbation may arise from inaccurate gradient estimation in the setting of data-driven optimization. Under suitable conditions on the objective function, the perturbed gradient flow is shown to be small-disturbance input-to-state stable (ISS), which implies that, in the presence of a small-enough perturbation, the trajectories of the perturbed gradient flow eventually enter a small neighborhood of the optimum. This work was motivated by the question of robustness of direct methods for the linear quadratic regulator problem, specifically the effect of perturbations caused by gradient estimation or round-off errors in policy optimization. We show small-disturbance ISS for three of the most common optimization algorithms: standard gradient flow, natural gradient flow, and Newton gradient flow.
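The following is a sketch of the standard LQR policy-gradient expressions and an Euler discretization of the perturbed gradient flow, with an assumed initial-state covariance Sigma0 and a random perturbation standing in for gradient-estimation error; it illustrates the setting rather than reproducing the paper's analysis.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def lqr_cost_gradient(A, B, Q, R, K, Sigma0):
    """Gradient of the LQR cost J(K) = trace(P_K Sigma0) with respect to K,
    using the standard formula grad J = 2 (R K - B^T P_K) Sigma_K."""
    Ak = A - B @ K
    P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))   # closed-loop value matrix
    Sigma = solve_continuous_lyapunov(Ak, -Sigma0)            # closed-loop Gramian
    return 2.0 * (R @ K - B.T @ P) @ Sigma

def perturbed_gradient_flow(A, B, Q, R, K0, Sigma0,
                            step=1e-3, noise=1e-3, n_iter=5000, seed=0):
    """Euler discretization of dK/dt = -grad J(K) + d(t); small-disturbance
    ISS says the iterates stay near the optimal gain when d is small."""
    rng = np.random.default_rng(seed)
    K = K0.copy()
    for _ in range(n_iter):
        d = noise * rng.standard_normal(K.shape)   # stand-in for gradient error
        K = K - step * (lqr_cost_gradient(A, B, Q, R, K, Sigma0) + d)
    return K
```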
-
This paper proposes a novel robust reinforcement learning framework for discrete-time linear systems with model mismatch that may arise from the sim-to-real gap. A key strategy is to invoke advanced techniques from control theory. Using the formulation of classical risk-sensitive linear quadratic Gaussian control, a dual-loop policy optimization algorithm is proposed to generate a robust optimal controller. The dual-loop policy optimization algorithm is shown to be globally and uniformly convergent, and robust against disturbances during the learning process. This robustness property, called small-disturbance input-to-state stability, guarantees that the proposed policy optimization algorithm converges to a small neighborhood of the optimal controller as long as the disturbance at each learning step is sufficiently small. In addition, when the system dynamics are unknown, a novel model-free off-policy policy optimization algorithm is proposed. Finally, numerical examples are provided to illustrate the proposed algorithm.
-
Risk sensitivity is a fundamental aspect of biological motor control that accounts for both the expectation and the variability of movement cost in the face of uncertainty. However, most computational models of biological motor control rely on model-based risk-sensitive optimal control, which requires an accurate internal representation in the central nervous system to predict the outcomes of motor commands. In reality, the dynamics of human-environment interaction are too complex to be accurately modeled, and noise further complicates system identification. To address this issue, this paper proposes a novel risk-sensitive computational mechanism for biological motor control based on reinforcement learning (RL) and adaptive dynamic programming (ADP). The proposed ADP-based mechanism suggests that humans can directly learn an approximation of the risk-sensitive optimal feedback controller from noisy sensory data without the need for system identification. Numerical validation of the proposed mechanism is conducted on an arm-reaching task under a divergent force field. The preliminary computational results align with experimental observations reported in the computational neuroscience literature.
-
This paper presents a novel learning-based adaptive optimal controller design for linear time-delay systems described by delay differential equations (DDEs). A key strategy is to exploit the value iteration (VI) approach to solve the linear quadratic optimal control problem for time-delay systems. However, previous learning-based control methods are exclusively devoted to discrete-time time-delay systems. In this article, we fill this gap by developing a learning-based VI approach to solve the infinite-dimensional algebraic Riccati equation (ARE) for continuous-time time-delay systems. One nice feature of the proposed VI approach is that an initial admissible controller is not required to start the algorithm. The efficacy of the proposed methodology is demonstrated by an example of autonomous driving.
-
In this paper, we study the robustness of policy optimization (in particular, the Gauss–Newton gradient descent algorithm, which is equivalent to policy iteration in reinforcement learning) subject to noise at each iteration. By invoking the concept of input-to-state stability and utilizing Lyapunov's direct method, it is shown that, if the noise at each iteration is sufficiently small, the policy iteration algorithm converges to a small neighborhood of the optimal solution. Explicit expressions for the upper bound on the noise and the size of the neighborhood to which the policies ultimately converge are provided. Based on Willems' fundamental lemma, a learning-based policy iteration algorithm is proposed, in which the persistent excitation condition can be readily guaranteed by checking the rank of the Hankel matrix associated with an exploration signal. The robustness of the learning-based policy iteration to measurement noise and unknown system disturbances is theoretically demonstrated via the input-to-state stability of the policy iteration. Several numerical simulations are conducted to demonstrate the efficacy of the proposed method.
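The rank test on the Hankel matrix of the exploration signal is straightforward to implement; the sketch below, assuming the input sequence is stored one sample per row, checks persistent excitation in the sense of Willems' fundamental lemma.

```python
import numpy as np

def block_hankel(u, depth):
    """Block-Hankel matrix whose columns are length-`depth` windows of the
    input sequence u (shape (T, m), one sample per row)."""
    T, m = u.shape
    cols = T - depth + 1
    H = np.empty((depth * m, cols))
    for i in range(cols):
        H[:, i] = u[i:i + depth].reshape(-1)
    return H

def persistently_exciting(u, order):
    """u is persistently exciting of the given order iff its depth-`order`
    block-Hankel matrix has full row rank."""
    H = block_hankel(u, order)
    return np.linalg.matrix_rank(H) == H.shape[0]

# A random exploration signal is generically persistently exciting.
rng = np.random.default_rng(0)
u = rng.standard_normal((200, 1))
print(persistently_exciting(u, order=5))   # expected: True
```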
-
In this paper, we propose a robust reinforcement learning method for a class of linear discrete-time systems to handle model mismatches that may be induced by the sim-to-real gap. Under the formulation of risk-sensitive linear quadratic Gaussian control, a dual-loop policy optimization algorithm is proposed to iteratively approximate the robust and optimal controller. The convergence and robustness of the dual-loop policy optimization algorithm are rigorously analyzed. It is shown that the dual-loop policy optimization algorithm uniformly converges to the optimal solution. In addition, by invoking the concept of small-disturbance input-to-state stability, it is guaranteed that the dual-loop policy optimization algorithm still converges to a neighborhood of the optimal solution when the algorithm is subject to a sufficiently small disturbance at each step. When the system matrices are unknown, a learning-based off-policy policy optimization algorithm is proposed for the same class of linear systems with additive Gaussian noise. Numerical simulations are conducted to demonstrate the efficacy of the proposed algorithm.
-
In this paper, we address the problem of model-free optimal output regulation of discrete-time systems, which aims at achieving asymptotic tracking and disturbance rejection without exact knowledge of the system parameters. Insights from reinforcement learning and adaptive dynamic programming are used to solve this problem. An interesting discovery is that model-free discrete-time output regulation differs from its continuous-time counterpart in the persistent excitation condition required to ensure the uniqueness and convergence of the policy iteration; it is shown that this condition must be carefully established for the policy iteration to retain these properties.
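To make the persistent-excitation point concrete, the sketch below implements a model-free policy-evaluation step for the simpler discrete-time LQR sub-problem (Q-learning-style least squares); it is an assumed illustration, not the output-regulation algorithm of the paper. The rank assertion on the regressor matrix is the kind of condition the abstract says must be carefully established for the least-squares solution, and hence the policy iteration, to be unique.

```python
import numpy as np

def quad_features(z):
    """Features f(z) such that f(z) @ theta = z^T H z when theta stacks the
    upper triangle of the symmetric matrix H."""
    n = len(z)
    return np.array([(1.0 if i == j else 2.0) * z[i] * z[j]
                     for i in range(n) for j in range(i, n)])

def unstack_symmetric(theta, n):
    """Rebuild the symmetric matrix H from its stacked upper triangle."""
    H = np.zeros((n, n))
    k = 0
    for i in range(n):
        for j in range(i, n):
            H[i, j] = H[j, i] = theta[k]
            k += 1
    return H

def q_policy_evaluation(X, U, Xnext, K, Q, R):
    """Evaluate the Q-function of the policy u = -K x from off-policy data
    (X[k], U[k], Xnext[k]) and return the improved gain.  The assertion is
    the persistent-excitation condition: without it the least-squares
    problem has infinitely many solutions."""
    n, m = X.shape[1], U.shape[1]
    Phi, y = [], []
    for x, u, xn in zip(X, U, Xnext):
        z, zn = np.concatenate([x, u]), np.concatenate([xn, -K @ xn])
        Phi.append(quad_features(z) - quad_features(zn))    # Bellman regressor
        y.append(x @ Q @ x + u @ R @ u)                     # stage cost
    Phi, y = np.array(Phi), np.array(y)
    assert np.linalg.matrix_rank(Phi) == Phi.shape[1], "data not persistently exciting"
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    H = unstack_symmetric(theta, n + m)
    K_new = np.linalg.solve(H[n:, n:], H[n:, :n])           # policy improvement
    return H, K_new
```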